Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies
Abstract
Parsing texts into universal dependencies (UD) in realistic scenarios requires infrastructure for morphological analysis and disambiguation (MA&D) of typologically different languages as a first tier. MA&D is particularly challenging in morphologically rich languages (MRLs), where the ambiguous space-delimited tokens must be disambiguated with respect to their constituent morphemes. Here we present a novel, language-agnostic framework for MA&D, based on a transition system with two variants, word-based and morpheme-based, and a dedicated transition to mitigate the biases of variable-length morpheme sequences. In experiments on a Modern Hebrew case study, our framework outperforms the state of the art, and we show that the morpheme-based morphological disambiguation (MD) consistently outperforms our word-based variant. We further illustrate the utility and multilingual coverage of our framework by morphologically analyzing and disambiguating the large set of languages in the UD treebanks.
Similar resources
Parsing Morphologically Rich Languages with (Mostly) Off-The-Shelf Software and Word Vectors
As a contribution to the 2014 SPMRL shared task on parsing morphologically rich languages, we show that it is now possible to achieve high dependency accuracy using existing parsers without the need for intricate multi-parser schemes even if only small amounts of training data are available. We further show that the impact of using word vectors on parsing quality heavily depends on the amount o...
Morphological and Syntactic Case in Statistical Dependency Parsing
Most morphologically rich languages with free word order use case systems to mark the grammatical function of nominal elements, especially for the core argument functions of a verb. The standard pipeline approach in syntactic dependency parsing assumes a complete disambiguation of morphological (case) information prior to automatic syntactic analysis. Parsing experiments on Czech, German, and H...
Testing the Effect of Morphological Disambiguation in Dependency Parsing of Basque
This paper presents a set of experiments performed on parsing Basque, a morphologically rich and agglutinative language, studying the effect of using the morphological analyzer for Basque together with the morphological disambiguation module, in contrast to using the gold standard tags taken from the treebank. The objective is to obtain a first estimate of the effect of errors in morphological ...
Universal Stanford dependencies: A cross-linguistic typology
Revisiting the now de facto standard Stanford dependency representation, we propose an improved taxonomy to capture grammatical relations across languages, including morphologically rich ones. We suggest a two-layered taxonomy: a set of broadly attested universal grammatical relations, to which language-specific relations can be added. We emphasize the lexicalist stance of the Stanford Dependen...
A Discriminative Model for Joint Morphological Disambiguation and Dependency Parsing
Most previous studies of morphological disambiguation and dependency parsing have been pursued independently. Morphological taggers operate on n-grams and do not take into account syntactic relations; parsers use the “pipeline” approach, assuming that morphological information has been separately obtained. However, in morphologically-rich languages, there is often considerable interaction betwe...